@constantinius
Contributor

Description

Previously, only text parts were extracted. Now the full range of possible part types is covered.

Issues

Closes https://linear.app/getsentry/issue/TET-1638/redact-images-google-genai

@constantinius constantinius requested a review from a team as a code owner January 5, 2026 11:16
@linear

linear bot commented Jan 5, 2026

Base automatically changed from constantinius/fix/redact-message-parts-type-blob to master January 13, 2026 09:56
@github-actions
Contributor

github-actions bot commented Jan 13, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

Ai

  • feat(ai): add cache writes for gen_ai by shellmayr in #5319
  • feat(ai): add parse_data_uri function to parse a data URI by constantinius in #5311

Other

  • feat(asyncio): Add on-demand way to enable AsyncioIntegration by sentrivana in #5288
  • feat(openai-agents): Inject propagation headers for HostedMCPTool by alexander-alderman-webb in #5297
  • feat: Support array types for logs and metrics attributes by alexander-alderman-webb in #5314

Bug Fixes 🐛

Integrations

  • fix(integrations): google-genai: reworked gen_ai.request.messages extraction from parameters by constantinius in #5275
  • fix(integrations): pydantic-ai: properly format binary input message parts to be conformant with the gen_ai.request.messages structure by constantinius in #5251
  • fix(integrations): Anthropic: add content transformation for images and documents by constantinius in #5276
  • fix(integrations): langchain add multimodal content transformation functions for images, audio, and files by constantinius in #5278

Litellm

  • fix(litellm): fix gen_ai.request.messages to be as expected by constantinius in #5255
  • fix(litellm): Guard against module shadowing by alexander-alderman-webb in #5249

Other

  • fix(ai): redact message parts content of type blob by constantinius in #5243
  • fix(clickhouse): Guard against module shadowing by alexander-alderman-webb in #5250
  • fix(gql): Revert signature change of patched gql.Client.execute by alexander-alderman-webb in #5289
  • fix(grpc): Derive interception state from channel fields by alexander-alderman-webb in #5302
  • fix(pure-eval): Guard against module shadowing by alexander-alderman-webb in #5252
  • fix(ray): Guard against module shadowing by alexander-alderman-webb in #5254
  • fix(threading): Handle channels shadowing by sentrivana in #5299
  • fix(typer): Guard against module shadowing by alexander-alderman-webb in #5253
  • fix: Stop suppressing exception chains in AI integrations by alexander-alderman-webb in #5309
  • fix: Send client reports for span recorder overflow by sentrivana in #5310

Documentation 📚

  • docs(metrics): Remove experimental notice by alexander-alderman-webb in #5304
  • docs: Update Python versions banner in README by sentrivana in #5287

Internal Changes 🔧

Release

  • ci(release): Bump Craft version to fix issues by BYK in #5305
  • ci(release): Switch from action-prepare-release to Craft by BYK in #5290

Other

  • chore(gen_ai): add auto-enablement for google genai by shellmayr in #5295
  • ci: 🤖 Update test matrix with new releases (01/19) by github-actions in #5330
  • ci: Add periodic AI integration tests by alexander-alderman-webb in #5313
  • chore: Use pull_request_target for changelog preview by BYK in #5323
  • chore: add unlabeled trigger to changelog-preview by BYK in #5315
  • chore: Add type for metric units by sentrivana in #5312
  • ci: Update tox and handle generic classifiers by sentrivana in #5306

🤖 This preview updates automatically when you update the PR.

Comment on lines 387 to 397
if isinstance(function_response, dict):
    tool_call_id = function_response.get("id")
    tool_name = function_response.get("name")
    response_dict = function_response.get("response") or {}
    # Prefer "output" key if present, otherwise use entire response
    output = response_dict.get("output", response_dict)
else:
    # FunctionResponse object
    tool_call_id = getattr(function_response, "id", None)
    tool_name = getattr(function_response, "name", None)
    response_obj = getattr(function_response, "response", None) or {}
Contributor

I've seen this .get() vs getattr pattern a lot in our AI integrations. Feels like introducing a helper function that would try both at once would potentially deduplicate a lot of code.

Not something that needs to be done in this PR, mostly thinking out loud.
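
A minimal sketch of the kind of helper suggested above, assuming it only needs to cover plain dicts and attribute-style objects. The name `get_field` and the `FakeFunctionResponse` class are hypothetical and do not exist in the SDK; they only illustrate the idea.

```python
from typing import Any


def get_field(obj: Any, name: str, default: Any = None) -> Any:
    """Read `name` from a dict key or an object attribute (hypothetical helper)."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


class FakeFunctionResponse:
    """Stand-in for an SDK response object; not the real google-genai class."""

    def __init__(self, id, name, response):
        self.id = id
        self.name = name
        self.response = response


as_dict = {"id": "call_1", "name": "lookup", "response": {"output": 42}}
as_obj = FakeFunctionResponse("call_2", "lookup", None)

for function_response in (as_dict, as_obj):
    # One code path for both shapes; `or {}` mirrors the snippet under review.
    response = get_field(function_response, "response") or {}
    output = get_field(response, "output", response)
    print(get_field(function_response, "id"), output)
```

With a helper like this, the `if isinstance(...)`/`else` pair in the snippet under review would collapse into a single branch.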

…AI messages

Add transform_content_part() and transform_message_content() functions
to standardize content part handling across all AI integrations.

These functions transform various SDK-specific formats (OpenAI, Anthropic,
Google, LangChain) into a unified format:
- blob: base64-encoded binary data
- uri: URL references (including file URIs)
- file: file ID references

Also adds get_modality_from_mime_type() helper to infer content modality
(image/audio/video/document) from MIME types.
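
As a rough illustration of what inferring a modality from a MIME type could look like, here is a minimal sketch. The function name `infer_modality` is hypothetical, and the fallback to "document" for missing or unrecognized types is an assumption of this sketch; the SDK's `get_modality_from_mime_type()` may behave differently.

```python
def infer_modality(mime_type):
    """Map a MIME type to image/audio/video/document (illustrative only)."""
    if not mime_type:
        return "document"  # assumed fallback, not confirmed by the PR
    top_level = mime_type.lower().split("/", 1)[0]
    if top_level in ("image", "audio", "video"):
        return top_level
    # application/pdf, text/plain and anything else are treated as documents here
    return "document"


for mime in ("image/png", "AUDIO/mpeg", "video/mp4", "application/pdf", None):
    print(mime, "->", infer_modality(mime))
```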
…rmats

Replace inline_data and file_data dict handling with the shared
transform_content_part function. Keep Google SDK object handling
and PIL.Image support local since those are Google-specific.

Add dedicated transform functions for each AI SDK:
- transform_openai_content_part() for OpenAI/LiteLLM image_url format
- transform_anthropic_content_part() for Anthropic image/document format
- transform_google_content_part() for Google GenAI inline_data/file_data
- transform_generic_content_part() for LangChain-style generic format

Refactor transform_content_part() to be a heuristic dispatcher that
detects the format and delegates to the appropriate specific function.

This allows integrations to use the specific function directly for
better performance and clarity, while maintaining backward compatibility
through the dispatcher for frameworks that can receive any format.
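
The dispatcher idea described above can be sketched roughly as follows. This is not the SDK's implementation: `detect_part_format`, `transform_part`, and `google_file_data_to_uri` are hypothetical names, and the dict shapes used for detection are assumptions about what each SDK typically sends.

```python
def detect_part_format(part):
    """Guess which SDK format a content part uses (illustrative heuristic)."""
    if not isinstance(part, dict):
        return None
    part_type = part.get("type")
    if part_type == "image_url" and "image_url" in part:
        return "openai"
    if part_type in ("image", "document") and "source" in part:
        return "anthropic"
    if "inline_data" in part or "file_data" in part:
        return "google"
    if part_type in ("image", "audio", "file"):
        return "generic"
    return None


def google_file_data_to_uri(part):
    """Example handler: turn a Google-style file_data dict into a uri part."""
    file_data = part.get("file_data") or {}
    return {
        "type": "uri",
        "mime_type": file_data.get("mime_type"),
        "uri": file_data.get("file_uri"),
    }


def transform_part(part, handlers):
    """Dispatch to the handler registered for the detected format, if any."""
    handler = handlers.get(detect_part_format(part))
    return handler(part) if handler is not None else None


handlers = {"google": google_file_data_to_uri}
google_part = {"file_data": {"mime_type": "image/png", "file_uri": "gs://bucket/a.png"}}
print(transform_part(google_part, handlers))
```

An integration that already knows its input format can call its handler directly and skip the detection step, which is the trade-off the commit message describes.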

Added 38 new unit tests for the SDK-specific functions.

Replace generic transform_content_part with the Google-specific
transform_google_content_part function for better performance and
clarity since we know Google GenAI uses inline_data and file_data formats.
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@constantinius constantinius enabled auto-merge (squash) January 19, 2026 09:55
Comment on lines +343 to +347
return {
    "type": "blob",
    "mime_type": mime_type,
    "content": BLOB_DATA_SUBSTITUTE,
}

Bug: When processing object-based inline_data or PIL.Image objects, the returned blob dictionary is missing the modality field, creating an inconsistent data structure compared to other data types.
Severity: HIGH

Suggested Fix

In _extract_part_content, update the logic for handling object-based inline_data and PIL.Image objects. Add the modality field to the returned dictionary by calling get_modality_from_mime_type(mime_type), similar to how file_data is handled. This will ensure all blob data structures are consistent.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: sentry_sdk/integrations/google_genai/utils.py#L343-L347

Potential issue: The function `_extract_part_content` creates an inconsistent data
structure for blob data. When handling object-based `Part` objects with `inline_data`
containing bytes (lines 337-348) or `PIL.Image` objects (lines 418-422), the returned
dictionary omits the `modality` field. However, when processing dictionary-based
`inline_data` or `file_data`, the `modality` field is correctly included using the
`get_modality_from_mime_type` helper. This inconsistency can lead to downstream
processing errors if other parts of the system expect a standardized blob format that
always includes `modality`.
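
One way to make the reported inconsistency visible is a small key check, sketched below. The required key set is an assumption drawn from this review comment, not a confirmed schema, and `missing_blob_keys` is a hypothetical helper; the `"[redacted]"` string merely stands in for `BLOB_DATA_SUBSTITUTE`.

```python
# Keys a blob part is assumed to carry, per the review comment above.
REQUIRED_BLOB_KEYS = frozenset({"type", "modality", "mime_type", "content"})


def missing_blob_keys(part):
    """Return the required keys that a blob dict lacks, sorted for stable output."""
    return sorted(REQUIRED_BLOB_KEYS - set(part))


# Shape returned at the flagged location (no modality).
reported = {"type": "blob", "mime_type": "image/png", "content": "[redacted]"}
# Shape after adding the field the comment asks for.
fixed = dict(reported, modality="image")

print(missing_blob_keys(reported))  # ['modality']
print(missing_blob_keys(fixed))     # []
```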

@constantinius constantinius merged commit 749d8e5 into master Jan 19, 2026
154 checks passed
@constantinius constantinius deleted the constantinius/fix/integrations/google-genai-report-image-inputs branch January 19, 2026 12:17
@constantinius constantinius restored the constantinius/fix/integrations/google-genai-report-image-inputs branch January 19, 2026 12:29
4 participants